Systems Biology Lab
2024-11-01
In the course Biosystems Data Analysis (P3) you will use R again extensively.
log(2.71): an expression
log(): a function or method
2.71: an argument of the function log()
0.9969486: the output of the expression
2.71, 0.9969486: are objects (of class numeric)
1 + 2: an expression
'+'(): a function or method
1 and 2: are arguments of the function '+'()
1 and 2: are objects (of class numeric)
+ is a function with two arguments. It can be written as a normal function '+'(1,2) or as an infix operator 1 + 2.
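The vocabulary above can be tried out directly in the console. The following is a minimal sketch; the object names result, prefix_sum and infix_sum are chosen here for illustration only.

```r
# An expression: the function log() applied to the argument 2.71
result <- log(2.71)
print(result)
class(result)                     # the output is an object of class "numeric"

# `+` is an ordinary function with two arguments ...
prefix_sum <- `+`(1, 2)
# ... that is normally written as an infix operator
infix_sum <- 1 + 2
identical(prefix_sum, infix_sum)  # both forms give the same object
```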
Note that these are the basic classes. Packages may introduce new child classes of basic classes.
vector class
vector is an abstract class in R, representing a tuple, a set of containers that carry an index number:
211.14, a character “a”, a string “Abc”, …
In an atomic vector:
Atom types are limited to three classes:
numeric: numbers
character: character strings
logical: logical values (TRUE, FALSE)
numeric vector: every element contains a single number
In the expressions seq(1,5) and 1:5
character vector: each element holds a single character string
logical vector: each element holds a single logical value
logical vectors are usually calculated using logical functions like ==, !=, >, < etc.:
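As a small sketch of the three atomic vector classes, the code below builds one vector of each kind. The object names are illustrative. Note that class() reports "integer" for 1:5; integer vectors still count as numeric, which is.numeric() confirms.

```r
num_a <- seq(1, 5)        # numeric vector with the values 1 to 5
num_b <- 1:5              # shorthand for the same values
chr   <- c("a", "Abc")    # character vector: one string per element
is_large <- num_a > 3     # logical vector produced by a comparison

is.numeric(num_a)         # TRUE
is.character(chr)         # TRUE
is.logical(is_large)      # TRUE
all(num_a == num_b)       # TRUE: both expressions yield the same values
```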
Use the concatenation function c() to create vectors with arbitrary data
In the expression c('a','b','sample')
How many arguments does the function c() take?
What is the class of the argument(s)?
What is, in general, the class of the output of the function c()?
How many arguments does the outer concatenation in c(c('a','b'),'sample') take? What will be the output?
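One way to explore questions like these is to inspect the objects with length() and class(). The sketch below uses different example values than the questions; the names v, nested and mixed are illustrative.

```r
v <- c(1.5, 2, 10)            # three arguments, one vector of length 3
length(v)
class(v)

nested <- c(c(1.5, 2), 10)    # the outer c() receives two arguments
identical(v, nested)          # c() flattens its arguments into one vector

mixed <- c(1, "a", TRUE)      # atoms of different classes are coerced
class(mixed)                  # to a common class, here "character"
```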
<- creates a semi-permanent copy in memory
[1] 0.321652743 0.255329920 0.813072467 0.844754566 0.741460668 0.452447160
[7] 0.660586335 0.884833407 0.376626137 0.396120389 0.203076562 0.613253285
[13] 0.755529126 0.082694459 0.340706577 0.261525724 0.256461629 0.236181747
[19] 0.477760768 0.450746441 0.475976889 0.247493265 0.983040848 0.634353817
[25] 0.516292815 0.764290652 0.625141596 0.249894288 0.612004598 0.850249073
[31] 0.996369231 0.276770240 0.553375643 0.425778985 0.279011318 0.575475608
[37] 0.547314094 0.776472423 0.786281534 0.458755063 0.392587437 0.984401637
[43] 0.952639495 0.609412803 0.075744548 0.792524322 0.748155368 0.870700320
[49] 0.610821505 0.919262931 0.037285194 0.157941364 0.565128270 0.973846421
[55] 0.337083640 0.220714435 0.775216294 0.264685093 0.834163542 0.709327108
[61] 0.073919139 0.942226410 0.581857634 0.656319579 0.583717612 0.630195420
[67] 0.287243839 0.524404053 0.693821546 0.727689154 0.184470702 0.592739058
[73] 0.511415237 0.135698926 0.551563792 0.117924483 0.231563725 0.218273374
[79] 0.393647147 0.320536121 0.083605645 0.269399869 0.679423194 0.937780983
[85] 0.871313597 0.787220405 0.879004461 0.470023791 0.702892255 0.409319881
[91] 0.632400250 0.077210962 0.356680507 0.142912854 0.976355729 0.033196826
[97] 0.001373154 0.676875147 0.427970793 0.778332553
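The call that produced the output above is not shown. A minimal sketch of assignment, assuming the values came from something like runif(100) (an assumption, as are the seed and the object names):

```r
set.seed(1)              # arbitrary seed, only to make this sketch reproducible
x <- runif(100)          # assumed call: 100 uniform random numbers in [0, 1]

length(x)                # the object stays available under its name ...
head(x, 3)
x_sorted <- sort(x)      # ... and can be reused in later expressions
```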
Elements of any atomic type of vector can have the value NA: not available
NA is not the same as 0 or ""
Operations on objects containing NA may or may not yield NA:
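A short sketch of how NA behaves in a few common operations; the object names are illustrative.

```r
x <- c(1, NA, 3)
is.na(x)                                 # FALSE TRUE FALSE

sum_with_na    <- sum(x)                 # NA: the missing value propagates
sum_without_na <- sum(x, na.rm = TRUE)   # 4: NA is dropped before summing

na_and_false   <- NA & FALSE             # FALSE, whatever the missing value is
na_or_true     <- NA | TRUE              # TRUE, whatever the missing value is
na_equals_zero <- NA == 0                # NA: comparing with NA is undecided
```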
Inf, -Inf: resulting from division by 0 or from calculations beyond the numeric capacity of the computer
NaN: resulting from numerical operations leading to an undefined number
sample() or sample.int()
help() for these functions
factor: each element holds a single discrete value
factor: a vector with a limited vocabulary
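The special values and the factor class can be illustrated as follows. This is a sketch only: the category labels, the seed and the object names are made up for the example.

```r
pos_inf      <- 1 / 0      # Inf
neg_inf      <- -1 / 0     # -Inf
overflow     <- 2^1024     # Inf: beyond the largest representable double
not_a_number <- 0 / 0      # NaN
is.nan(not_a_number)

set.seed(42)               # arbitrary seed for reproducibility
draws <- sample(c("low", "medium", "high"), size = 10, replace = TRUE)
f <- factor(draws, levels = c("low", "medium", "high"))
levels(f)                  # the limited vocabulary of the factor
table(f)                   # counts per level
```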
array: a vector with a dim attribute
array: consists of an atomic vector. Special case: matrix, a 2-dimensional array
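A minimal sketch of the relation between a vector, a matrix and an array; the object names are illustrative.

```r
v <- 1:6
m <- v
dim(m) <- c(2, 3)          # adding a dim attribute makes it a 2 x 3 matrix
is.matrix(m)
m[2, 3]                    # elements are filled column by column

a <- array(1:24, dim = c(2, 3, 4))   # a 3-dimensional array
dim(a)
as.vector(m)               # the underlying atomic vector is unchanged
```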
data.frame: the equivalent of a table
The elements of a data.frame correspond to columns
The elements of a data.frame must all have equal length
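The sketch below builds a small data.frame from three vectors of equal length and shows that unequal lengths are rejected. The column names and values are invented for illustration.

```r
d <- data.frame(
  name    = c("s1", "s2", "s3"),
  weight  = c(70.2, 81.5, 64.8),
  treated = c(TRUE, FALSE, TRUE)
)
str(d)        # one line per column, each with its own class
length(d)     # number of elements, that is, columns
nrow(d)

# Columns of unequal length cannot be combined into a data.frame
unequal <- tryCatch(data.frame(a = 1:3, b = 1:2),
                    error = function(e) "error")
unequal
```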
Welch Two Sample t-test
data: weight by gender
t = -3.4717, df = 15.944, p-value = 0.00316
alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
95 percent confidence interval:
-17.014143 -4.110979
sample estimates:
mean in group Female mean in group Male
67.44078 78.00334
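The data behind the output above are not included here. As a sketch, a test of the same form can be run on simulated data; the data frame d, its values and the seed are assumptions, so the numbers will differ from those shown.

```r
set.seed(123)   # arbitrary seed for reproducibility
d <- data.frame(
  gender = rep(c("Female", "Male"), each = 10),
  weight = c(rnorm(10, mean = 67, sd = 7),
             rnorm(10, mean = 78, sd = 7))
)

# The formula interface gives "data: weight by gender";
# t.test() uses the Welch variant by default (var.equal = FALSE)
res <- t.test(weight ~ gender, data = d)
res$method
res$p.value
res$conf.int
res$estimate
```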
$ operator works with named indexes:
[1] 72.93805 91.13514 73.61649 64.29421 84.09792 71.75040 81.99970 73.98459
[9] 82.68189 83.53499 68.73414 73.21086 66.80506 69.45581 66.43022 72.51171
[17] 60.32600 73.96749 65.76018 57.20630
[[() function works with numeric and named indexes:
[1] 72.93805 91.13514 73.61649 64.29421 84.09792 71.75040 81.99970 73.98459
[9] 82.68189 83.53499 68.73414 73.21086 66.80506 69.45581 66.43022 72.51171
[17] 60.32600 73.96749 65.76018 57.20630
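The two ways of extracting a column can be compared side by side. This sketch uses a small made-up data frame; the column name weight is an assumption about the data used above.

```r
d <- data.frame(
  weight = c(72.9, 91.1, 73.6),
  gender = c("Male", "Male", "Female")
)

by_dollar   <- d$weight         # $ takes a name
by_name     <- d[["weight"]]    # [[ takes a name ...
by_position <- d[[1]]           # ... or a numeric index

identical(by_dollar, by_name)
identical(by_name, by_position)
```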
library(packagename)
require(packagename)
plot(x=iris$Petal.Length, y=iris$Sepal.Length,
asp=1,
xlab="Petal length (cm)", ylab="Sepal length (cm)",
col=c('maroon','plum3','navyblue')[iris$Species],
pch = 19
)
lines(x=c(1,7),y=c(4.31+0.41*1, 4.31+0.41*7), lwd=2, col='red')
legend(1.2, 8,
pch=19,
col=c('maroon','plum3','navyblue'),
legend=levels(iris$Species))
Good enough for “quick plots”
tidyverse way
ggplot(data=iris, mapping=aes(x=Petal.Length, y=Sepal.Length, colour=Species)) +
labs(x='Petal length (cm)', y='Sepal length (cm)') +
stat_smooth(method=lm, se=FALSE, colour="red") +
geom_point(size=3) +
theme(legend.position=c(0.05,0.95), legend.justification=c(0,1)) +
scale_color_manual(values=c('maroon','plum3','navyblue'))
Publication quality graphs
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1            5.8         2.7          4.1           1 versicolor
2            6.4         2.8          5.6         2.1  virginica
3            4.4         3.2          1.3         0.2     setosa
..            ..          ..           ..          ..         ..
150          6.1         2.8            4         1.3 versicolor
means <- do.call(rbind, lapply(iris[,c("Sepal.Length", "Sepal.Width")], function(x) {tapply(x, iris$Species, mean)}))
sds <- do.call(rbind, lapply(iris[,c("Sepal.Length", "Sepal.Width")], function(x) {tapply(x, iris$Species, sd)}))
rownames(means) <- paste0(rownames(means), "_avg")
rownames(sds) <- paste0(rownames(sds), "_sd")
df <- as.data.frame(rbind(means,sds))
df
                    setosa versicolor virginica
Sepal.Length_avg 5.0060000  5.9360000 6.5880000
Sepal.Width_avg  3.4280000  2.7700000 2.9740000
Sepal.Length_sd  0.3524897  0.5161711 0.6358796
Sepal.Width_sd   0.3790644  0.3137983 0.3224966
tidyverse way
| Species | Sepal.Length_avg | Sepal.Length_sd | Sepal.Width_avg | Sepal.Width_sd |
|---|---|---|---|---|
| setosa | 5.006 | 0.3524897 | 3.428 | 0.3790644 |
| versicolor | 5.936 | 0.5161711 | 2.770 | 0.3137983 |
| virginica | 6.588 | 0.6358796 | 2.974 | 0.3224966 |
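The code that produced this table is not shown. One possible way to obtain a table with these column names, assuming the dplyr package is installed, is sketched below; it is not necessarily the code used for the slide, and the name summary_tbl is illustrative.

```r
library(dplyr)

summary_tbl <- iris %>%
  group_by(Species) %>%
  summarise(across(c(Sepal.Length, Sepal.Width),
                   list(avg = mean, sd = sd),   # names become the suffixes
                   .names = "{.col}_{.fn}"))

summary_tbl
```

Compared with the base R version, the grouping variable and the summary functions are stated once, and the result has one row per species.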
Document writing is integrated with code writing
Functions do things with the input
R contains many pre-defined functions; you have already seen mean, sd, sum and t.test
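Besides the pre-defined functions, you can define your own. A minimal sketch, using a hypothetical helper cv() (coefficient of variation) built from sd and mean:

```r
# cv() is a made-up example function, not part of base R
cv <- function(x, na.rm = FALSE) {
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

cv_sepal <- cv(iris$Sepal.Length)   # input: a numeric vector; output: one number
cv_sepal
```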
Looping
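As a sketch of looping, the code below computes column means of iris twice: once with an explicit for loop and once with sapply(), which applies a function to each element. The object names are illustrative.

```r
cols <- c("Sepal.Length", "Sepal.Width")

# Explicit loop: pre-allocate the result, then fill it element by element
means_loop <- numeric(length(cols))
names(means_loop) <- cols
for (col in cols) {
  means_loop[col] <- mean(iris[[col]])
}

# The same computation without an explicit loop
means_apply <- sapply(iris[cols], mean)

all.equal(means_loop, means_apply)
```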